Setup

Background

recount3 contains over 750,000 uniformly processed human and mouse RNA-seq samples. It provides gene, exon and exon-exon junction count matrices, both as text files and as RangedSummarizedExperiment objects.

The reads in recount3 were aligned with the splice-aware STAR aligner as part of the Monorail pipeline. To compute the gene count matrices, the aligned reads were quantified against the Gencode annotation (v26 by default, with v29 also available) on hg38 coordinates.

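Unlike traditional quantification methods, which count the reads (or fragments) assigned to each feature, recount3 provides base-pair coverage counts. For each feature, the number of reads overlapping each of its bases is summed across all of its bases. Dividing a coverage count by the read length therefore gives an approximate read count.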
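A minimal Python sketch of how a base-pair coverage count relates to a read count. Everything in it is an illustrative assumption and not recount3 code: reads are toy half-open `[start, end)` intervals on one chromosome, the feature is a single contiguous interval, all reads are 100 bp, and the helper names are hypothetical. The `scale_to_target` helper only mirrors the general idea of AUC-based scaling; recount3 offers a `transform_counts()` function for this conversion, and its exact formula and defaults (the 40 million target here is assumed) should be checked in the package documentation.

```python
# Sketch: base-pair coverage counts versus read counts for a single feature.
# Assumptions (illustrative only): reads are half-open [start, end) intervals on one
# chromosome, the feature is one contiguous interval, and all reads are 100 bp long.

READS = [(0, 100), (50, 150), (200, 300)]  # hypothetical toy alignments
READ_LENGTH = 100


def per_base_coverage(reads, start, end):
    """Number of reads overlapping each base of the feature [start, end)."""
    coverage = [0] * (end - start)
    for read_start, read_end in reads:
        # Only the part of the read that falls inside the feature contributes.
        for pos in range(max(read_start, start), min(read_end, end)):
            coverage[pos - start] += 1
    return coverage


def coverage_count(reads, start, end):
    """Base-pair coverage count: per-base coverage summed over the feature."""
    return sum(per_base_coverage(reads, start, end))


def approx_read_count(cov_count, read_length):
    """Approximate read count; reads that only partly overlap contribute a fraction."""
    return cov_count / read_length


def scale_to_target(cov_count, auc, target_size=40_000_000):
    """Rescale a coverage count to a library of `target_size` reads.

    `auc` is the sample's total base coverage (area under the coverage curve),
    so cov_count / auc is the feature's share of all aligned bases.
    """
    return cov_count * target_size / auc


if __name__ == "__main__":
    gene_cov = coverage_count(READS, 0, 300)
    print(gene_cov, approx_read_count(gene_cov, READ_LENGTH))  # 300 3.0
    partial = coverage_count(READS, 100, 200)
    print(partial, approx_read_count(partial, READ_LENGTH))    # 50 0.5
```

A feature covering all three reads gets a coverage count of 300, or 3 reads. A feature that only overlaps half of one read gets 50, or 0.5 reads, which is why coverage-derived counts are not integers. If these three reads were the whole toy sample, its AUC would be 300.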

Datasets

Ramaker et al. (2017)

Metadata preprocessing

  • Selection of metadata variables for the subset

##                                                        V1
## RunID                                         -0.92868104
## SampleID                                       0.55372437
## SampleACC                                      0.51017130
## ExperimentACC                                  0.51057124
## ExperimentTitle                                0.51057124
## SampleAttributes                              -0.72738923
## ExperimentAttributes                           0.51057124
## SampleName                                     0.51057124
## SampleTitle                                   -1.23250559
## SampleBases                                    1.11527919
## SampleSpots                                    1.11527919
## RunPublished                                   0.54456759
## Size                                           1.12694331
## RunTotalBases                                  1.11527919
## RunTotalSpots                                  1.11527919
## NumSpots                                       1.11527919
## ReadInfo                                       0.38058013
## RunAlias                                       0.55372437
## ChimericPairs                                  0.01076623
## Percent_aligned_ChrX                          -1.12734087
## Percent_aligned_ChrY                          -1.23402102
## AUC_all_alignments                             1.15151841
## AUC_all_annotated_exons                        1.14631544
## AUC_uniquely_aligned                           1.14576839
## AUC_all_annotated_exons_unique                 1.14411532
## AUC_all_percentage                            -0.21444988
## AUC_unique_percentage                         -0.34046719
## TotalNFragments                                1.15118136
## ReadFragmentLength                             1.15582171
## MeanFragmentLength                            -0.76348074
## MeanFragmentLength_BAM                        -1.45914025
## ModeFragmentLength                            -1.37374875
## ModeFragmentLengthCount                        0.98259768
## Percentage_fragment_mapped_exon_fc            -0.84445720
## Percentage_fragment_mapped_unique_exon_fc     -0.94177028
## Total_fragments_input_fc_exon_fc               1.14873291
## Total_fragments_assigned_exon_fc               1.07236766
## Total_fragments_count_unique_exon_fc           1.14873291
## Total_fragments_count_unique_assigned_exon_fc  1.07236766
## Percentage_fragment_mapped_gene_fc            -0.77542433
## Percentage_fragment_mapped_unique_gene_fc     -0.90055582
## Total_fragments_input_fc_gene_fc               1.14873291
## Total_fragments_assigned_gene_fc               1.08135938
## Total_fragments_count_unique_gene_fc           1.14873291
## Total_fragments_count_unique_assigned_gene_fc  1.07082942
## IntronTotal                                    0.71638493
## IntronicRate                                  -0.89960724
## Percentage_chimeric_reads_STAR                -0.60238847
## Percentage_mapped_multi_loci_STAR             -1.08440659
## Percentage_mapped_too_many_loci_STAR          -0.82383860
## Percentage_unmapped_other_STAR                -0.57680385
## Percentage_unmapped_too_short_STAR            -1.88467814
## ReadsMapped                                    1.15118136
## Average_mapped_length_STAR                    -0.32566596
## Deletion_average_length_STAR                  -0.55906864
## Deteltion_rate_per_base_STAR                  -0.29576566
## Insertion_average_length_STAR                 -0.55574228
## Insertion_rate_per_base_STAR                  -1.51974581
## Mapping_speed_per_hour_STAR                   -0.96668977
## Percentage_mismatch_per_base_STAR             -1.74721034
## Number_of_chimeric_reads_STAR                  1.09888346
## TotalNReads                                    1.11527919
## Number_reads_mapped_to_multiple_loci_STAR      0.95339808
## Number_reads_mapped_to_too_many_loci_STAR      0.80898014
## Number_reads_unmapped_other_STAR               0.52528344
## Number_reads_unmapped_too_short_STAR          -0.53600615
## Number_canonical_splices_AT_AC_STAR            0.73608655
## Number_canonical_splices_GC_AG_STAR            0.75971354
## Number_canonical_splices_GT_AG_STAR            0.75900426
## Number_non_canonical_splices_STAR              1.02989622
## Number_splices_total_STAR                      0.76152217
## MappingRate                                   -0.25675886
## MappingRate_unique                             1.14539241
## Junction_count                                 0.69893170
## Junction_coverage                              0.76855901
## Junction_average_coverage                      0.68260096
## Number_input_reads_both_STAR                   1.11527919
## All_mapped_reads_both_STAR                     1.15118136
## Number_chimeric_reads_both_STAR                1.09888346
## Number_reads_mapped_multiple_loci_both_STAR    0.95339808
## Numner_reads_mapped_too_many_loci_both_STAR    0.80898014
## Number_reads_unmapped_other_both_STAR          0.52528344
## Number_reads_unmapped_too_short_both_STAR     -0.53600615
## Number_reads_mapped_uniquely_both_STAR         1.14539241
## MappingRate_both                              -0.30697213
## Percent_Chimeric_both                         -0.60168883
## Percentage_mapped_multi_loci_both_STAR        -1.08362058
## Percentage_mapped_too_many_loci_both_STAR     -0.78002945
## Percentage_unmapped_other_both_STAR           -0.71012681
## Percentage_unmapped_too_short_both_STAR       -1.88483368
## Percentage_uniquely_mapped_both_STAR          -0.25660163
## DIstinctQualityValues                         -0.39411799
## Percent_Bases                                  1.11527919
## Percent_A                                      0.31485922
## Percent_C                                     -2.27390093
## Percent_G                                     -2.29695804
## Percent_T                                      0.15663928
## Percent_N                                     -1.64457035
## Average_Phred                                 -0.33586526
## ErrQ                                          -0.50472935
## SampleAccPrediction                            0.51017130
## PredictionType                                -1.86320912
## BigWigFile                                    -1.24533672
## Age                                           -0.74812676
## StructureAcronym                              -1.06065904
## Diagnosis                                     -1.04369569
## Ethnicity                                     -0.71326591
## Sex                                           -1.29032706
## PMI                                           -1.03429880
## Regions                                       -1.08274401
## Age_rounded                                   -0.73698637
## AgeInterval                                   -0.74829756
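The printed scores contain exact ties. For example, SampleBases, SampleSpots, RunTotalBases, RunTotalSpots, NumSpots and TotalNReads all score 1.11527919, which hints that several columns may carry the same information. The Python sketch below is illustrative and not part of the original pipeline. It groups variables by their printed score so that such candidates can be reviewed before selecting variables. It uses a hand-copied subset of the values above, and the helper name `tied_groups` is hypothetical. Because the printed values are rounded to eight decimals, a shared score is only a hint of redundancy, not proof.

```python
from collections import defaultdict

# Subset of the V1 scores printed above for Ramaker et al. (2017), copied by hand.
RAMAKER_SCORES = {
    "SampleBases": 1.11527919,
    "SampleSpots": 1.11527919,
    "RunTotalBases": 1.11527919,
    "RunTotalSpots": 1.11527919,
    "NumSpots": 1.11527919,
    "TotalNReads": 1.11527919,
    "Number_input_reads_both_STAR": 1.11527919,
    "Percent_Bases": 1.11527919,
    "TotalNFragments": 1.15118136,
    "ReadsMapped": 1.15118136,
    "All_mapped_reads_both_STAR": 1.15118136,
    "MappingRate_unique": 1.14539241,
    "Number_reads_mapped_uniquely_both_STAR": 1.14539241,
    "Number_of_chimeric_reads_STAR": 1.09888346,
    "Number_chimeric_reads_both_STAR": 1.09888346,
    "Age": -0.74812676,
    "Sex": -1.29032706,
}


def tied_groups(scores, digits=8):
    """Group variable names whose scores are equal after rounding to `digits` decimals.

    Returns only groups with two or more members, each sorted alphabetically.
    """
    groups = defaultdict(list)
    for name, value in scores.items():
        groups[round(value, digits)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    for group in tied_groups(RAMAKER_SCORES):
        print(group)
```

With this subset the sketch reports four groups of tied variables, while Age and Sex, which have unique scores, are left out. Which member of a group to keep is a separate, study-specific decision.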

Expression matrix preprocessing

Variance partition analysis

Human Developmental Biology Resource

From recount3, I also retrieved the Human Developmental Biology Resource (HDBR) dataset, which is the largest resource of prenatal samples.

Metadata preprocessing

  • Selection of metadata variables for the subset

##                                                        V1
## SampleID                                      -0.32358296
## SequencingBatch                               -0.65771931
## Age                                           -1.16477300
## DonorID                                       -1.01590434
## Karyotype                                     -0.88228800
## Structure                                     -0.26930854
## Hemisphere                                    -1.01001065
## AgeInterval                                   -1.13264158
## RunID                                         -0.94598292
## SampleACC                                     -0.32256113
## ExperimentACC                                 -0.32256113
## SampleDescription                             -0.97881805
## LibraryName                                   -0.51207166
## SampleAttributes                              -0.51207166
## ExperimentAttributes                          -0.27294169
## SampleName                                    -0.32256113
## SampleTitle                                   -0.51207166
## SampleBases                                    1.58257583
## SampleSpots                                    1.58257583
## RunPublished                                  -0.32382229
## Size                                           0.76779749
## RunTotalBases                                  1.58257583
## RunTotalSpots                                  1.58257583
## NumSpots                                       1.58257583
## ReadInfo                                       0.98324962
## RunAlias                                      -0.51458739
## ChimericPairs                                 -0.74389630
## Percent_aligned_ChrX                          -0.90686770
## Percent_aligned_ChrY                          -0.85182354
## AUC_all_alignments                             1.56908916
## AUC_all_annotated_exons                        1.51539878
## AUC_uniquely_aligned                           1.50287780
## AUC_all_annotated_exons_unique                 1.48349310
## AUC_all_percentage                            -0.86292955
## AUC_unique_percentage                         -0.83599638
## TotalNFragments                                1.57252958
## ReadFragmentLength                             1.53686952
## MeanFragmentLength                            -0.90832665
## MeanFragmentLength_BAM                        -0.88151528
## ModeFragmentLength                            -0.73265629
## ModeFragmentLengthCount                        1.21843722
## Percentage_fragment_mapped_exon_fc            -0.75248373
## Percentage_fragment_mapped_unique_exon_fc     -0.72609221
## Total_fragments_input_fc_exon_fc               0.91056357
## Total_fragments_assigned_exon_fc               1.39127438
## Total_fragments_count_unique_exon_fc           0.91056357
## Total_fragments_count_unique_assigned_exon_fc  1.39127438
## Percentage_fragment_mapped_gene_fc            -0.74740724
## Percentage_fragment_mapped_unique_gene_fc     -0.72130741
## Total_fragments_input_fc_gene_fc               0.91056357
## Total_fragments_assigned_gene_fc               1.40646528
## Total_fragments_count_unique_gene_fc           0.91056357
## Total_fragments_count_unique_assigned_gene_fc  1.39263511
## IntronTotal                                   -0.02259468
## IntronicRate                                  -1.16056427
## Percentage_chimeric_reads_STAR                -0.96713136
## Percentage_mapped_multi_loci_STAR             -0.91640144
## Percentage_mapped_too_many_loci_STAR          -0.88585112
## Percentage_unmapped_other_STAR                -0.66264819
## Percentage_unmapped_too_short_STAR            -0.72552264
## ReadsMapped                                    1.57252958
## Average_mapped_length_STAR                    -0.99723418
## Deletion_average_length_STAR                  -1.12150560
## Deteltion_rate_per_base_STAR                  -0.88564118
## Insertion_average_length_STAR                 -1.04854167
## Insertion_rate_per_base_STAR                  -0.80300388
## Mapping_speed_per_hour_STAR                   -0.42546013
## Percentage_mismatch_per_base_STAR             -0.65104321
## Number_of_chimeric_reads_STAR                  0.51459421
## TotalNReads                                    1.58257583
## Number_reads_mapped_to_multiple_loci_STAR      0.05385478
## Number_reads_mapped_to_too_many_loci_STAR      0.22555765
## Number_reads_unmapped_other_STAR              -0.45844057
## Number_reads_unmapped_too_short_STAR           0.27134682
## Number_canonical_splices_AT_AC_STAR            0.99768947
## Number_canonical_splices_GC_AG_STAR            0.97053742
## Number_canonical_splices_GT_AG_STAR            1.13249222
## Number_non_canonical_splices_STAR              0.95070503
## Number_splices_total_STAR                      1.13375178
## MappingRate                                   -0.97424225
## MappingRate_unique                             1.52496573
## Junction_count                                 0.17104979
## Junction_coverage                              1.10759446
## Junction_average_coverage                      0.96123990
## Number_input_reads_both_STAR                   1.58257583
## All_mapped_reads_both_STAR                     1.57252958
## Number_chimeric_reads_both_STAR                0.51459421
## Number_reads_mapped_multiple_loci_both_STAR    0.05385478
## Numner_reads_mapped_too_many_loci_both_STAR    0.22555765
## Number_reads_unmapped_other_both_STAR         -0.45844057
## Number_reads_unmapped_too_short_both_STAR      0.27134682
## Number_reads_mapped_uniquely_both_STAR         1.52496573
## MappingRate_both                              -1.12274349
## Percent_Chimeric_both                         -1.02047475
## Percentage_mapped_multi_loci_both_STAR        -0.91517758
## Percentage_mapped_too_many_loci_both_STAR     -0.86324132
## Percentage_unmapped_other_both_STAR           -0.66959375
## Percentage_unmapped_too_short_both_STAR       -0.72218240
## Percentage_uniquely_mapped_both_STAR          -0.97387363
## DIstinctQualityValues                         -1.24737783
## Percent_Bases                                  1.58257583
## Percent_A                                     -1.03063158
## Percent_C                                     -0.90708642
## Percent_G                                     -0.80138311
## Percent_T                                     -0.91433115
## Percent_N                                     -1.00877817
## Average_Phred                                 -0.85788026
## ErrQ                                          -0.70773003
## SampleAccPrediction                           -0.32256113
## BigWigFile                                    -0.86811510

Expression matrix preprocessing

Variance partition analysis